Smart Paper Notes

Visualizing Graph Neural Networks with CorGIE: Corresponding a Graph to Its Embedding

Zipeng Liu , Yang Wang , Jürgen Bernard , Tamara Munzner

Category: Machine Learning

2021-06-24

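Graph neural networks (GNNs) are a class of powerful machine learning tools that model node relations in order to make predictions about nodes or links. GNN developers rely on quantitative metrics of the predictions to evaluate a GNN, but, as with many other neural networks, it is difficult for them to tell whether the GNN has truly learned the characteristics of a graph as expected. We propose an approach for corresponding an input graph to its node embedding (also known as the latent space), a common component of GNNs that is later used for prediction. We abstract the data and tasks, and develop an interactive multi-view interface called CorGIE to instantiate the abstraction. As the key function in CorGIE, we propose the K-hop graph layout to show topological neighbors in hops and their clustering structure. To evaluate the functionality and usability of CorGIE, we show how to use CorGIE in two usage scenarios and conduct a case study with five GNN experts.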

Segmentation based tracking of cells in 2D+time microscopy images of macrophages

Seol Ah Park , Tamara Sipka , Zuzana Kriva , George Lutfalla , Mai Nguyen-Chi , Karol Mikula

Category: Computer Vision

2023-01-02

The automated segmentation and tracking of macrophages during their migration are challenging tasks because of their dynamically changing shapes and motions. This paper proposes a new algorithm for automatic cell tracking in time-lapse microscopy macrophage data. First, we design a segmentation method employing space-time filtering, local Otsu's thresholding, and the SUBSURF (subjective surface segmentation) method. Next, partial trajectories for cells that overlap in the temporal direction are extracted from the segmented images. Finally, the extracted trajectories are linked by considering their direction of movement. The segmented images and the trajectories obtained with the proposed method are compared with those from semi-automatic segmentation and manual tracking. The proposed tracking achieved 97.4% accuracy on macrophage data under challenging conditions, including feeble fluorescent intensity, irregular shapes, and motion of macrophages. We expect that the automatically extracted trajectories can provide evidence of how macrophages migrate depending on their polarization modes in situations such as wound healing.
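
The local Otsu's thresholding step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tile-based windowing, the function names `otsu_threshold` and `local_otsu_mask`, and the toy image are all assumptions made for illustration, and the space-time filtering and SUBSURF steps are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t are background and pixels > t are foreground."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the background class so far
    sum0 = 0    # sum of intensities in the background class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def local_otsu_mask(image, tile):
    """Threshold each tile x tile block with its own Otsu threshold.
    Assumption: non-overlapping square tiles; a uniform tile falls back
    to threshold 0, which is a known weakness of this simple variant."""
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            rs = range(r0, min(r0 + tile, rows))
            cs = range(c0, min(c0 + tile, cols))
            t = otsu_threshold([image[r][c] for r in rs for c in cs])
            for r in rs:
                for c in cs:
                    mask[r][c] = image[r][c] > t
    return mask


# Toy 2x4 image: the left tile is dim, the right tile is bright.
toy_image = [
    [10, 80, 100, 200],
    [12, 85, 105, 210],
]
toy_mask = local_otsu_mask(toy_image, tile=2)
```

Because each tile gets its own threshold, the brighter object in each tile is separated from its local background even though the two tiles have very different overall intensity, which is the motivation for a local rather than a global threshold when fluorescence is uneven.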

Are you using test log-likelihood correctly?

Sameer K. Deshpande , Soumya Ghosh , Tin D. Nguyen , Tamara Broderick

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-01

Test log-likelihood is commonly used to compare different models of the same data and different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on other distributional quantities such as means; and (ii) approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations.
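
Point (i) can be made concrete with a small sketch. The example below is not taken from the paper: the held-out values, the two Gaussian predictive distributions, and the function names are hypothetical choices made only to show how the two criteria can disagree.

```python
import math


def gaussian_logpdf(y, mean, std):
    """Log density of a normal distribution N(mean, std^2) at y."""
    var = std ** 2
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)


def mean_test_log_likelihood(ys, mean, std):
    """Average predictive log density over the held-out points."""
    return sum(gaussian_logpdf(y, mean, std) for y in ys) / len(ys)


# Hypothetical held-out data with sample mean 0.
test_ys = [-2.0, -1.0, 0.0, 1.0, 2.0]
true_mean = sum(test_ys) / len(test_ys)

# Model A: exactly right mean, badly overconfident spread.
mean_a, std_a = 0.0, 0.3
# Model B: biased mean, but a spread that matches the data much better.
mean_b, std_b = 0.5, 1.5

tll_a = mean_test_log_likelihood(test_ys, mean_a, std_a)
tll_b = mean_test_log_likelihood(test_ys, mean_b, std_b)
mean_error_a = abs(mean_a - true_mean)
mean_error_b = abs(mean_b - true_mean)

print(f"model A: test log-likelihood {tll_a:.2f}, mean error {mean_error_a:.2f}")
print(f"model B: test log-likelihood {tll_b:.2f}, mean error {mean_error_b:.2f}")
```

In this toy setting model B wins on test log-likelihood while model A estimates the mean more accurately, so ranking the models by one criterion reverses the ranking by the other.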

Scientific and Creative Analogies in Pretrained Language Models

Tamara Czinczoll , Helen Yannakoudakis , Pushkar Mishra , Ekaterina Shutova

Categories: Natural Language Processing | Machine Learning

2022-11-28

This paper examines the encoding of analogy in large-scale pretrained language models, such as BERT and GPT-2. Existing analogy datasets typically focus on a limited set of analogical relations, with a high similarity of the two domains between which the analogy holds. As a more realistic setup, we introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy dataset containing systematic mappings of multiple attributes and relational structures across dissimilar domains. Using this dataset, we test the analogical reasoning capabilities of several widely-used pretrained language models (LMs). We find that state-of-the-art LMs achieve low performance on these complex analogy tasks, highlighting the challenges still posed by analogy understanding.

Linear Classification of Neural Manifolds with Correlated Variability

Albert J. Wakhloo , Tamara J. Sussman , SueYeon Chung

Categories: Neural and Evolutionary Computing | Machine Learning (Statistics)

2022-11-27

Understanding how the statistical and geometric properties of neural activations relate to network performance is a key problem in theoretical neuroscience and deep learning. In this letter, we calculate how correlations between object representations affect the capacity, a measure of linear separability. We show that for spherical object manifolds, introducing correlations between centroids effectively pushes the spheres closer together, while introducing correlations between the spheres' axes effectively shrinks their radii, revealing a duality between neural correlations and geometry. We then show that our results can be used to accurately estimate the capacity with real neural data.

CLAWSAT: Towards Both Robust and Accurate Code Models

Jinghan Jia , Shashank Srikant , Tamara Mitrovska , Chuang Gan , Shiyu Chang , Sijia Liu , Una-May O'Reilly

Category: Machine Learning

2022-11-21

We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary 'views' of a code that enable us to achieve both robust and accurate code models. To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models. Specifically, we first adopt adversarial codes as robustness-promoting views in CL at the self-supervised pre-training phase. This yields improved robustness and transferability for downstream tasks. Next, at the supervised fine-tuning stage, we show that adversarial training with a proper temporally-staggered schedule of adversarial code generation can further improve robustness and accuracy of the pre-trained code model. Built on the above two modules, we develop CLAWSAT, a novel self-supervised learning (SSL) framework for code by integrating $\underline{\textrm{CL}}$ with $\underline{\textrm{a}}$dversarial vie$\underline{\textrm{w}}$s (CLAW) with $\underline{\textrm{s}}$taggered $\underline{\textrm{a}}$dversarial $\underline{\textrm{t}}$raining (SAT). On evaluating three downstream tasks across Python and Java, we show that CLAWSAT consistently yields the best robustness and accuracy (e.g., 11% in robustness and 6% in accuracy on the code summarization task in Python). We additionally demonstrate the effectiveness of adversarial learning in CLAW by analyzing the characteristics of the loss landscape and interpretability of the pre-trained models.

Machine learning-accelerated chemistry modeling of protoplanetary disks

Grigorii V. Smirnov-Pinchukov , Tamara Molyarova , Dmitry A. Semenov , Vitaly V. Akimkin , Sierk van Terwisga , Riccardo Francheschi , Thomas Henning

Category: Machine Learning

2022-09-27

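Aims. With the large amount of molecular emission data from (sub)millimeter observatories and the incoming James Webb Space Telescope infrared spectroscopy, access to fast forward models of the chemical composition of protoplanetary disks is of paramount importance. Methods. We used a thermo-chemical modeling code to generate a diverse population of protoplanetary disk models. We trained a K-nearest neighbors (KNN) regressor to instantly predict the chemistry of other disk models. Results. We show that it is possible to accurately reproduce the chemistry using just a small subset of physical conditions, thanks to correlations between the local physical conditions in the adopted protoplanetary disk models. We discuss the uncertainties and limitations of this method. Conclusions. The proposed method can be used for Bayesian fitting of line emission data to retrieve disk properties from observations. We present a pipeline for reproducing the same approach on other disk chemical model sets.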
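
The idea of a K-nearest neighbors regressor can be sketched in a few lines. This is a generic illustration rather than the authors' pipeline: the two input features, the synthetic target, and the function name `knn_regress` are hypothetical stand-ins for the local physical conditions and chemical abundances produced by a thermo-chemical code.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict the target at `query` as the unweighted mean of the targets
    of its k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


# Hypothetical training set on a 5x5 grid of two rescaled "physical
# conditions" (think of them as stand-ins for density and temperature).
train_x = [(float(d), float(t)) for d in range(5) for t in range(5)]
# Synthetic target playing the role of a log abundance; the formula is
# an arbitrary smooth function chosen for the example.
train_y = [2.0 * d - t for d, t in train_x]

# Prediction at a grid point, averaged over it and its four neighbors.
pred_center = knn_regress(train_x, train_y, (2.0, 2.0), k=5)
# Prediction close to a corner of the grid using a single neighbor.
pred_corner = knn_regress(train_x, train_y, (0.1, 3.9), k=1)
```

Prediction only requires a neighbor lookup in the precomputed model set, which is why such a regressor can stand in for a much slower forward model once enough training models exist. In practice the features would need consistent scaling, since the distance metric treats all dimensions equally.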

Can an ML model plainly learn planar layouts?

Simon van Wageningen , Tamara Mchedlidze

Category: Machine Learning

2022-09-02

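Planar graph drawings tend to be aesthetically pleasing. In this poster, we explore a neural network's capability to learn various planar graph classes. We also investigate how well the model generalizes beyond planarity. We find that the model can outperform conventional techniques for certain graph classes. However, the model appears to be more susceptible to randomness in the data and seems to be less robust than expected.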

A general framework for the analysis of kernel-based tests

Tamara Fernández , Nicolás Rivera

Category: Machine Learning (Statistics)

2022-08-31

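Kernel-based tests provide a simple yet effective framework that uses the theory of reproducing kernel Hilbert spaces to design non-parametric testing procedures. In this paper, we propose new theoretical tools that can be used to study the asymptotic behaviour of kernel-based tests in several data scenarios and in many different testing problems. Unlike current approaches, our methods avoid the lengthy $U$- and $V$-statistic expansions and limit theorems that commonly appear in the literature, and work directly with random functionals on Hilbert spaces. Our framework therefore leads to a much simpler and cleaner analysis of kernel tests, requiring only mild regularity conditions. Furthermore, we show that, in general, our analysis cannot be improved, by proving that the regularity conditions required by our methods are both sufficient and necessary. To illustrate the effectiveness of our approach, we present a new kernel test for the conditional independence testing problem, as well as new analyses of already known kernel-based tests.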

Developing a Series of AI Challenges for the United States Department of the Air Force

Vijay Gadepally , Gregory Angelides , Andrei Barbu , Andrew Bowne , Laura J. Brattain , Tamara Broderick , Armando Cabrera , Glenn Carl , Ronisha Carter , Miriam Cha

Category: Artificial Intelligence

2022-07-14

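Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are developing public challenge problems that address numerous federal AI research priorities. These challenges target those priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual-use technologies that can stimulate further research. In this article, we describe the public challenges being developed and how their application contributes to scientific advances.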
